Using deep features for image retrieval

Import GraphLab


In [4]:
import graphlab as gl

Load the CIFAR-10 dataset


In [7]:
image_train = gl.SFrame('image_train_data/')
image_train.head()

Train a nearest-neighbors model for retrieving images using deep features


In [5]:
knn_model = gl.nearest_neighbors.create(image_train,
                                        features=['deep_features'],
                                        label='id')


PROGRESS: Starting brute force nearest neighbors model training.
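To make the brute-force step concrete, here is a minimal pure-Python sketch of what a nearest-neighbors query computes. It assumes Euclidean distance between feature vectors (an assumption; the notebook does not state the metric). The function name `brute_force_query` and the toy three-dimensional vectors are hypothetical and stand in for the real `deep_features` column.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def brute_force_query(reference, query_vec, k=5):
    """Return the k closest (label, distance, rank) triples.

    `reference` maps a row id to its feature vector. Every reference row
    is compared with the query, which is what "brute force" means here.
    """
    scored = sorted(
        (euclidean(vec, query_vec), label) for label, vec in reference.items()
    )
    return [(label, dist, rank)
            for rank, (dist, label) in enumerate(scored[:k], start=1)]

# Hypothetical toy data standing in for the deep_features column.
toy_reference = {
    384: [0.0, 1.0, 2.0],
    6910: [0.5, 1.0, 2.0],
    39777: [3.0, 1.0, 2.0],
}
toy_result = brute_force_query(toy_reference, [0.0, 1.0, 2.0], k=2)
print(toy_result)
```

In this sketch, querying with a vector that is itself in the reference set returns that row first, at distance 0.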

Use the image retrieval model with deep features to find similar images


In [13]:
gl.canvas.set_target('ipynb')
cat = image_train[18:19]
cat['image'].show()



In [8]:
knn_model.query(cat)


PROGRESS: Starting pairwise querying.
PROGRESS: +--------------+---------+-------------+--------------+
PROGRESS: | Query points | # Pairs | % Complete. | Elapsed Time |
PROGRESS: +--------------+---------+-------------+--------------+
PROGRESS: | 0            | 1       | 0.0498753   | 8.572ms      |
PROGRESS: | Done         |         | 100         | 240.553ms    |
PROGRESS: +--------------+---------+-------------+--------------+
Out[8]:
query_label reference_label distance rank
0 384 0.0 1
0 6910 36.9403137951 2
0 39777 38.4634888975 3
0 36870 39.7559623119 4
0 41734 39.7866014148 5
[5 rows x 4 columns]

In [11]:
def get_images_from_ids(query_result):
    return image_train.filter_by(query_result['reference_label'], 'id')
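The helper relies on `filter_by`, which keeps the rows whose `id` appears in the given list of values. A rough pure-Python equivalent, using a hypothetical list of dictionaries in place of the SFrame, might look like this:

```python
def filter_rows_by_ids(rows, wanted_ids, column="id"):
    """Keep the rows whose `column` value is in `wanted_ids`."""
    wanted = set(wanted_ids)  # a set gives fast membership checks
    return [row for row in rows if row[column] in wanted]

# Hypothetical stand-in for image_train; real rows also carry image data.
toy_rows = [
    {"id": 24, "label": "bird"},
    {"id": 384, "label": "cat"},
    {"id": 6910, "label": "cat"},
]
toy_matches = filter_rows_by_ids(toy_rows, [6910, 384])
print([row["id"] for row in toy_matches])
```

In this sketch the result follows the order of the table rather than the order of the ids, so the rank from the query is not carried over.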

In [14]:
cat_neighbors = get_images_from_ids(knn_model.query(cat))


PROGRESS: Starting pairwise querying.
PROGRESS: +--------------+---------+-------------+--------------+
PROGRESS: | Query points | # Pairs | % Complete. | Elapsed Time |
PROGRESS: +--------------+---------+-------------+--------------+
PROGRESS: | 0            | 1       | 0.0498753   | 16.306ms     |
PROGRESS: | Done         |         | 100         | 261.317ms    |
PROGRESS: +--------------+---------+-------------+--------------+

In [19]:
cat_neighbors['image'].show()



In [20]:
car = image_train[8:9]

In [21]:
car['image'].show()


Get similar cars


In [22]:
get_images_from_ids(knn_model.query(car))['image'].show()


PROGRESS: Starting pairwise querying.
PROGRESS: +--------------+---------+-------------+--------------+
PROGRESS: | Query points | # Pairs | % Complete. | Elapsed Time |
PROGRESS: +--------------+---------+-------------+--------------+
PROGRESS: | 0            | 1       | 0.0498753   | 12.773ms     |
PROGRESS: | Done         |         | 100         | 279.012ms    |
PROGRESS: +--------------+---------+-------------+--------------+

Lambda function to show nearest neighbor images


In [23]:
show_neighbors = lambda i:get_images_from_ids(knn_model.query(image_train[i:i+1]))['image'].show()

In [26]:
show_neighbors(19)


PROGRESS: Starting pairwise querying.
PROGRESS: +--------------+---------+-------------+--------------+
PROGRESS: | Query points | # Pairs | % Complete. | Elapsed Time |
PROGRESS: +--------------+---------+-------------+--------------+
PROGRESS: | 0            | 1       | 0.0498753   | 12.833ms     |
PROGRESS: | Done         |         | 100         | 251.998ms    |
PROGRESS: +--------------+---------+-------------+--------------+

In [28]:
show_neighbors(1222)


PROGRESS: Starting pairwise querying.
PROGRESS: +--------------+---------+-------------+--------------+
PROGRESS: | Query points | # Pairs | % Complete. | Elapsed Time |
PROGRESS: +--------------+---------+-------------+--------------+
PROGRESS: | 0            | 1       | 0.0498753   | 19.876ms     |
PROGRESS: | Done         |         | 100         | 247.165ms    |
PROGRESS: +--------------+---------+-------------+--------------+

In [30]:
show_neighbors(2000)


PROGRESS: Starting pairwise querying.
PROGRESS: +--------------+---------+-------------+--------------+
PROGRESS: | Query points | # Pairs | % Complete. | Elapsed Time |
PROGRESS: +--------------+---------+-------------+--------------+
PROGRESS: | 0            | 1       | 0.0498753   | 16.089ms     |
PROGRESS: | Done         |         | 100         | 245.259ms    |
PROGRESS: +--------------+---------+-------------+--------------+

Computing summary statistics of the data


In [33]:
image_train['label'].sketch_summary()


Out[33]:
+------------------+-------+----------+
|       item       | value | is exact |
+------------------+-------+----------+
|      Length      |  2005 |   Yes    |
| # Missing Values |   0   |   Yes    |
| # unique values  |   4   |    No    |
+------------------+-------+----------+

Most frequent items:
+-------+------------+-----+-----+------+
| value | automobile | cat | dog | bird |
+-------+------------+-----+-----+------+
| count |    509     | 509 | 509 | 478  |
+-------+------------+-----+-----+------+
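The same counts can be reproduced with exact arithmetic in plain Python. The sketch below rebuilds a label list from the counts in the summary; it does not read the real SFrame, and `labels` is a hypothetical stand-in for `image_train['label']`.

```python
from collections import Counter

# Rebuild a label column with the counts reported in the summary above.
labels = (["automobile"] * 509 + ["cat"] * 509 +
          ["dog"] * 509 + ["bird"] * 478)

label_counts = Counter(labels)
print(len(labels))          # total number of rows
print(len(label_counts))    # number of distinct labels
print(label_counts.most_common())
```

The four per-label counts add up to the reported length of 2005.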

Creating category-specific image retrieval models


In [35]:
dog_image_train = image_train[image_train['label']=='dog']

In [69]:
cat_image_train = image_train[image_train['label']=='cat']
##cat_image_train.head()

In [70]:
auto_image_train = image_train[image_train['label']=='automobile']

In [71]:
bird_image_train = image_train[image_train['label']=='bird']
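Each of these filters builds a boolean mask from a comparison and keeps the rows where the mask is true. A minimal sketch of that two-step idea on hypothetical toy rows (the helper `split_by_label` is illustrative, not part of GraphLab):

```python
def split_by_label(rows, categories):
    """Return {category: [rows with that label]} using one mask per category."""
    subsets = {}
    for category in categories:
        # Step 1: one True/False entry per row.
        mask = [row["label"] == category for row in rows]
        # Step 2: keep only the rows where the mask is True.
        subsets[category] = [row for row, keep in zip(rows, mask) if keep]
    return subsets

toy_train = [
    {"id": 1, "label": "dog"},
    {"id": 2, "label": "cat"},
    {"id": 3, "label": "dog"},
    {"id": 4, "label": "bird"},
]
toy_subsets = split_by_label(toy_train, ["dog", "cat", "automobile", "bird"])
print({name: [r["id"] for r in rows] for name, rows in toy_subsets.items()})
```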

KNN Model creation for each category


In [41]:
dog_model = gl.nearest_neighbors.create(dog_image_train,
                                        features=['deep_features'],
                                        label='id')


PROGRESS: Starting brute force nearest neighbors model training.

In [43]:
image_test = gl.SFrame('image_test_data/')

In [44]:
cat_model = gl.nearest_neighbors.create(cat_image_train,
                                        features=['deep_features'],
                                        label='id')


PROGRESS: Starting brute force nearest neighbors model training.

In [45]:
auto_model = gl.nearest_neighbors.create(auto_image_train,
                                         features=['deep_features'],
                                         label='id')


PROGRESS: Starting brute force nearest neighbors model training.

In [46]:
bird_model = gl.nearest_neighbors.create(bird_image_train,
                                         features=['deep_features'],
                                         label='id')


PROGRESS: Starting brute force nearest neighbors model training.

In [57]:
query_result_cat = cat_model.query(image_test[0:1])
cat_image_train.filter_by(query_result_cat['reference_label'], 'id')['image'].show()


PROGRESS: Starting pairwise querying.
PROGRESS: +--------------+---------+-------------+--------------+
PROGRESS: | Query points | # Pairs | % Complete. | Elapsed Time |
PROGRESS: +--------------+---------+-------------+--------------+
PROGRESS: | 0            | 1       | 0.0498753   | 12.346ms     |
PROGRESS: | Done         |         | 100         | 214.92ms     |
PROGRESS: +--------------+---------+-------------+--------------+

In [134]:
query_result_dog = dog_model.query(image_test[0:1])
dog_image_train.filter_by(query_result_dog['reference_label'], 'id')['image'].show()


PROGRESS: Starting pairwise querying.
PROGRESS: +--------------+---------+-------------+--------------+
PROGRESS: | Query points | # Pairs | % Complete. | Elapsed Time |
PROGRESS: +--------------+---------+-------------+--------------+
PROGRESS: | 0            | 1       | 0.196464    | 9.347ms      |
PROGRESS: | Done         |         | 100         | 80.029ms     |
PROGRESS: +--------------+---------+-------------+--------------+

A simple example of nearest-neighbors classification


In [63]:
# Mean distance from the test image to its nearest cat neighbors
query_result_cat['distance'].mean()


Out[63]:
36.15573070978294

In [135]:
query_result_dog['distance'].mean()
#query_result_dog


Out[135]:
37.77071136184156
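Comparing the two means gives a simple classification rule: assign the test image to the category whose nearest neighbors are closest on average. A minimal sketch using the two values printed above (the dictionary and variable names are illustrative):

```python
# Mean distances copied from Out[63] and Out[135].
mean_distances = {
    "cat": 36.15573070978294,
    "dog": 37.77071136184156,
}

# Pick the category with the smallest mean distance to the query image.
predicted_label = min(mean_distances, key=mean_distances.get)
print(predicted_label)
```

With these numbers, the first test image is closer on average to its cat neighbors than to its dog neighbors.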

Computing NN accuracy using SFrame operations

Split the test dataset by category

In [65]:
cat_test_image = image_test[image_test['label']=='cat']

In [66]:
dog_test_image = image_test[image_test['label']=='dog']

In [72]:
auto_test_image = image_test[image_test['label']=='automobile']
bird_test_image = image_test[image_test['label']=='bird']

Find the nearest neighbor in each category for every dog test image

More KNN queries


In [78]:
dog_cat_neighbors = cat_model.query(dog_test_image, k=1)
dog_cat_neighbors


PROGRESS: Starting blockwise querying.
PROGRESS: max rows per data block: 7668
PROGRESS: number of reference data blocks: 4
PROGRESS: number of query data blocks: 1
PROGRESS: +--------------+---------+-------------+--------------+
PROGRESS: | Query points | # Pairs | % Complete. | Elapsed Time |
PROGRESS: +--------------+---------+-------------+--------------+
PROGRESS: | 1000         | 501000  | 24.9875     | 1.14s        |
PROGRESS: | Done         | 2005000 | 100         | 1.33s        |
PROGRESS: +--------------+---------+-------------+--------------+
Out[78]:
query_label reference_label distance rank
0 49803 33.4773590373 1
1 5755 32.8458495684 1
2 20715 35.0397073189 1
3 13387 33.9010327697 1
4 7493 34.778824791 1
5 6094 34.945165344 1
6 3431 39.0957278345 1
7 6184 37.7696131032 1
8 2167 35.1089144603 1
9 44673 42.7258732951 1
[1000 rows x 4 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.

In [80]:
dog_auto_neighbors = auto_model.query(dog_test_image, k=1)


PROGRESS: Starting blockwise querying.
PROGRESS: max rows per data block: 7668
PROGRESS: number of reference data blocks: 4
PROGRESS: number of query data blocks: 1
PROGRESS: +--------------+---------+-------------+--------------+
PROGRESS: | Query points | # Pairs | % Complete. | Elapsed Time |
PROGRESS: +--------------+---------+-------------+--------------+
PROGRESS: | 1000         | 501000  | 24.9875     | 1.18s        |
PROGRESS: | Done         | 2005000 | 100         | 1.38s        |
PROGRESS: +--------------+---------+-------------+--------------+

In [81]:
dog_bird_neighbors = bird_model.query(dog_test_image, k=1)


PROGRESS: Starting blockwise querying.
PROGRESS: max rows per data block: 7668
PROGRESS: number of reference data blocks: 4
PROGRESS: number of query data blocks: 1
PROGRESS: +--------------+---------+-------------+--------------+
PROGRESS: | Query points | # Pairs | % Complete. | Elapsed Time |
PROGRESS: +--------------+---------+-------------+--------------+
PROGRESS: | 1000         | 502000  | 25.0374     | 1.14s        |
PROGRESS: | Done         | 2005000 | 100         | 1.47s        |
PROGRESS: +--------------+---------+-------------+--------------+

In [83]:
dog_dog_neighbors = dog_model.query(dog_test_image, k=1)


PROGRESS: Starting blockwise querying.
PROGRESS: max rows per data block: 7668
PROGRESS: number of reference data blocks: 4
PROGRESS: number of query data blocks: 1
PROGRESS: +--------------+---------+-------------+--------------+
PROGRESS: | Query points | # Pairs | % Complete. | Elapsed Time |
PROGRESS: +--------------+---------+-------------+--------------+
PROGRESS: | 1000         | 127000  | 24.9509     | 432.802ms    |
PROGRESS: | Done         | 509000  | 100         | 616.962ms    |
PROGRESS: +--------------+---------+-------------+--------------+

Build the dog_distances SFrame


In [84]:
dog_distances = gl.SFrame({'dog_cat':dog_cat_neighbors['distance'],
                          'dog_auto': dog_auto_neighbors['distance'],
                          'dog_bird':dog_bird_neighbors['distance'],
                          'dog_dog':dog_dog_neighbors['distance']})

In [85]:
dog_distances.head()


Out[85]:
dog_auto dog_bird dog_cat dog_dog
33.4773590373 33.4773590373 33.4773590373 33.4773590373
32.8458495684 32.8458495684 32.8458495684 32.8458495684
35.0397073189 35.0397073189 35.0397073189 35.0397073189
33.9010327697 33.9010327697 33.9010327697 33.9010327697
34.778824791 34.778824791 34.778824791 37.4849250909
34.945165344 34.945165344 34.945165344 34.945165344
39.0957278345 39.0957278345 39.0957278345 39.0957278345
37.7696131032 37.7696131032 37.7696131032 37.7696131032
35.1089144603 35.1089144603 35.1089144603 35.1089144603
42.7258732951 42.7258732951 42.7258732951 43.2422832585
[10 rows x 4 columns]

Computing the number of correct predictions


In [125]:
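def is_dog_correct(row):
    # A dog test image counts as correct when its nearest dog neighbor is
    # at least as close as the nearest neighbor from every other category.
    closest_other = min(row['dog_auto'], row['dog_bird'], row['dog_cat'])
    return 1 if row['dog_dog'] <= closest_other else 0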

Using the magic of .apply()


In [129]:
# Apply the row-wise check once and reuse the result.
dog_correct = dog_distances.apply(is_dog_correct)
print dog_correct.sum()
print len(dog_correct)


678
1000

In [133]:
accuracy = 678/1000.0
print "Accuracy ", accuracy


Accuracy  0.678
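The steps above amount to applying a per-row test and dividing the number of passes by the number of rows. The sketch below does this on a few hypothetical rows (the distances are made up for illustration and are not taken from the notebook), with a list of dictionaries in place of the SFrame and an illustrative helper named `dog_is_nearest`:

```python
def dog_is_nearest(row):
    """1 if the nearest dog is at least as close as every other category."""
    closest_other = min(row["dog_auto"], row["dog_bird"], row["dog_cat"])
    return 1 if row["dog_dog"] <= closest_other else 0

# Hypothetical distances, for illustration only.
toy_distances = [
    {"dog_auto": 41.0, "dog_bird": 38.5, "dog_cat": 36.0, "dog_dog": 33.5},
    {"dog_auto": 40.0, "dog_bird": 37.0, "dog_cat": 32.0, "dog_dog": 34.0},
    {"dog_auto": 39.0, "dog_bird": 35.0, "dog_cat": 36.5, "dog_dog": 34.5},
    {"dog_auto": 30.0, "dog_bird": 39.0, "dog_cat": 38.0, "dog_dog": 37.0},
]

# Row-wise application, analogous to calling .apply() on the SFrame.
flags = [dog_is_nearest(row) for row in toy_distances]
toy_accuracy = sum(flags) / float(len(flags))
print(flags, toy_accuracy)
```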
